m 



•3 



(N 
> 



co 



X 



Alternating minimal energy methods for linear 
systems in higher dimensions. Part II: Faster 
algorithm and application to nonsymmetric 

systems* 



Sergey V. Dolgov^ and Dmitry V. Savostyanov 



t 



O- April 11, 2013 

(NT 

a.- 
<: 

Abstract 

In this paper we accomplish the development of the fast rank-adaptive solver for 
tensor-structured symmetric positive definite linear systems in higher dimensions. 

<£ In |9] this problem is approached by alternating minimization of the energy function, 

which we combine with steps of the basis expansion in accordance with the steepest 
descent algorithm. In this paper we combine the same steps in such a way that the 
resulted algorithm works with one or two neighboring cores at a time. The recurrent 
interpretation of the algorithm allows to prove the global convergence and to estimate 
the convergence rate. We also propose several strategies, both rigorous and heuristic, 
to compute new subspaces for the basis enrichment in a more efficient way. We test the 
algorithm on a number of high-dimensional problems, including the non-symmetrical 
Fokker-Planck and chemical master equations, for which the efficiency of the method 
is not fully supported by the theory. In all examples we observe a convincing fast 
convergence and high efficiency of the proposed method. 

Keywords: high-dimensional problems, tensor train format, ALS, DMRG, steepest 

q \ descent, convergence rate, superfast algorithms. 



1 Introduction 



In this paper we develop the results of 0. We consider tensor-structured linear systems, 
which arise naturally from high-dimensional problems, e.g. PDEs. The number of un- 
knowns grows exponentially w.r.t. the number of dimensions d, which makes standard 
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algorithms inefficient even for moderate d. This problem is known as the curse of dimension- 
ality, and is attacked by different low-parametric approximations, e.g. sparse grids |32ll3H 
and tensor product methods |2"01IT81IT2"| . A particularly simple, elegant and efficient represen- 
tation of high-dimensional data is a linear tensor network, also called the matrix product 
states (MPS) and tensor train (TT) format. 

The MPS approach was originally proposed in the quantum physics community to 
represent the quantum states of many-body systems |[T0HT9f . This representation was re- 
discovered as the TT format by Oseledets and Tyrtyshnikov [24J, who were looking for a 
proper method to generalize a low-rank decomposition of matrices to high-dimensional 
arrays (tensors). The MPS approach came with the alternating least squares (ALS) and den- 
sity matrix renormalization group (DMRG) |j36ll26| algorithms for the ground state problem. 
The ALS considers the minimization of the Rayleigh quotient over the vectors with a fixed 
tensor structure, while DMRG does the same allowing the rank of the solution to change. 
Experiments from quantum physics point out that the convergence of the DMRG is usually 
notably fast, while the one of the ALS can be rather poor. 

The general numerical linear algebra context in which the TT format is introduced al- 
lows to think more widely about the power of tensor representations. For instance, we can 
apply DMRG-like techniques to high-dimensional problems other than just the ground 
state problem, e.g. interpolation of high-dimensional data [25ll3T| , solution of linear sys- 
tems |14ll8|, fast linear algebra in tensor formats [23J. We can also consider better alterna- 
tives to the DMRG, which follow the same alternating linear scheme (ALS) framework, but 
are numerically more efficient. A tempting goal is to obtain an algorithm which has the 
DMRG-like convergence and the ALS-like numerical complexity. In [9] we present such 
an algorithm for a solution of symmetric positive definite (SPD) linear systems in higher 
dimensions. 

The central idea in [9] is to support the alternating steps, i.e. optimization in a fixed 
tensor manifold, by steps which expand the basis in accordance with some classical iter- 
ative algorithms. A steepest descent (SD) algorithm is a natural choice for SPD problems. 
The enrichment step uses the essential information about the global residual of the large 
high-dimensional system on the local optimization step, that helps to escape the spuri- 
ous local minima introduced by the nonlinear tensor formulation and ensure the global 
convergence. The convergence rate of the whole method can then be established adapting 
a classical theory. In contrast, optimization in the fixed tensor manifolds can be analyzed 
via the Gauss-Seidel theory and only local convergence estimates are available [28J, which 
hold only in a (very) small vicinity of the exact soution. 

The global enrichment step used in algorithms "X + ALS(z)" and "ALS(t + z]" in [9j 
modifies all components of the tensor train format simultaneously. There is nothing par- 
ticularly wrong with this, but it is interesting to mix the same steps differently to obtain the 
algorithm which works with only one or two neighboring components at once, similarly 
to the DMRG technique. In this paper we develop such a method, namely the alternat- 
ing minimal energy (AMEn) algorithm. We prove the global convergence of AMEn and 
estimate the convergence rate w.r.t. the one of the steepest descent algorithm. We also 
propose several methods to compute the required local component of the global residual, 
using either the SVD-based approximation, or incomplete Cholesky decomposition, or 
low-rank ALS approximation. 

The rest of the paper is organized as follows. In Section |2] we introduce necessary 
definitions and notations. In Section |3] we propose the AMEn algorithm, then we compare 



it with similar algorithms from [91 and prove the convergence theorem. In Section |4] we 
discuss efficient methods to compute the required component of the residual. In Section [5] 
we test the algorithm on a number of high-dimensional problems, including the non- 
symmetrical Fokker-Planck and chemical master equations, for which the efficiency of 
the method is not fully supported by the theory. In all examples we observe a convincing 
fast convergence and high efficiency of the proposed method, as well as the advantages of 
the AMEn algorithm over the previously proposed ones. 

2 Notations and definitions 

This paper is based on the notations of 0, which we recall briefly here. 

We consider linear systems Ax = y in d-dimensional space, i.e. assume that a vector 
x has d indices ii , . . . , i d , and i k = 1 , . . . , n k , k = 1 , . . . , d. Such arrays are referred to as 
d-tensors x = [x(i! , . . . , t d )]. For the purposes of this paper it is convenient to consider a 
vectorization of a tensor 



x = vecx, x[Xi ...tdj =x(ii,...,x d J, 

where It ... id denotes a single index combined from ii , . . . , ijj. This index grouping is 
widely used throughout the paper. In the following we do not distinguish between 
x and x. 

The tensor train (TT) representation of x is written as the following multilinear mapn, 

X = T(X)=T(X^,...,X (d >), 

x(wQ = x£, ai (ii)x( 2 » a2 (i 2 ) . . .xLtll^ (id-i)xLl 1Ad (i d ), 

where i k are referred to as mode (physical) indices, a k = 1 , . . . , r k are the rank indices, 
X^ are the tensor train cores (TT-cores) and X = [X^\ . . . , X' d ^) denotes the tensor train. 
We follow the Einstein summation convention, which assumes a summation over every 
pair of repeated indices. All equations are supposed to hold for all possible values of free 
(unpared) indices. 

The t mapping is defined also for a subset of TT-cores (a subtrain) and maps it to the 
interface matrix of size n-i . . . u k x r k , defined as follows, 

X a =T(X^...,X< k »), 

X^iz.-.k, «o = xgdoxg^dz) . . .xi£_ 1|B ju 

and similarly for symbols X <k , X >k and X >k . For x given by ((!]) we have X {k} = X^ k X >k . 

x The multi-index can be defined following the big-endian convention ii . . .id = id + Ua-i — 1)n.d + 
. . . + (ii — 1 )n2 . . . Ud or little-endian convention ii . . . id = ii + (12 — 1 )ni + . . . + (id — 1 )ni . . . Ud-i . The 
big-endian notation is similar to numbers written in the positional system, while the little-endian notation 
is used in numerals in the Arabic scripts and is consistent with the Fortran style of indexing. The definition 
of the Kronecker (tensor) product <£> should be also consistent with the chosen endianness. The orthodox 
definition in linear algebra assumes the big-endianness, while the development of the efficient program 
code usually makes us think in the little-endian way. The rest of the paper can be read without a particular 
care of the endianness. It is enough to remember that z = x ® u means z(ij) = x(i)u(j). 

2 Note that t maps tensor train cores to a vectorized representation of a d-tensor, not to the d-tensor 
itself, cf. quantized tensor train (QTT) [17|. In this paper we do not distinguish between them and keep the 
notation simple. 



Note that the definition of t allows us to write 

x = t(X*\ X >k ) = t(X<\ X (k) , X >k ) = x(X <k , X (k) , X (k+1) , X >k+2 ), 

where two last mappings depict the decompositions used in ALS and DMRG algorithms 
proposed by S. White et al. fl3"6ll26| for the ground state problem in Quantum Physics. The 
original DMRG algorithm is formulated via the minimization of the Rayleigh quotient, 
where the heavily nonlinear high-dimensional optimization is reduced to the sequence of 
numerically tractable optimizations over the elements of each core. 

Similarly, we consider the solution of a linear equation Ax = y through the minimiza- 
tion of the energy function 

T(x) = ||x* — x|| A = (x, Ax) — 2![H(x,y) + const, (3) 

where x* = A _1 y is the exact solution, and ||u|| A = (u, u) A = (u, Au) denotes the A-norm 
of a vector u. Following the alternating linear scheme (ALS), the high-dimensional minimiza- 
tion is reduced to the minimization w.r.t. all cores one-by-one. Each local minimization is 
equivalent to the solution of a linear system, which is tractable due to a moderate size. The 
high-dimensional linear system can be split into a sequence of one-dimensional systems 
due to the linearity of the tensor train format t(X (1 ] , . . . , X (d) ) w.r.t. each TT-core X (k) . This 
linearity writes as the following matrix-by-vector product 

x = T (X) = T(X <k , X (k) , X >k ) = 0V k (X)x (k) , 

t (4) 

0V k (x)=x <k ®i nk ®(x >k ) T , 

where the vectorized TT-core x' k ' is a reshape of the three-dimensional array into a vector 

X M =vecX M xM(« k -iiic« k )=xW ti8k (i k ). (5) 

The elementwise definition of the frame matrix X^ k = CP^ k (X) is the following 

X^iT^TU, Ok-^ofc) = X£ J (i, ) . . . XSti!^, (i k _i )6(i k , h)^L (Wi ) • • • K d L (id). ( 6 ) 

where 5(1, j ) is the Kronecker symbol, i.e., 6(1, j) = 1 if i = j and 5(1, j ) = elsewhere. 
Similarly we define frame matrices CP^ k (X), CPj, k (X), 3\(X). For example, y<; k (X) writes 



X^ k = 3^ k (X) = X $k ® I 



X a (i! ... l d , a k j k+ i . . . j d ) = X' J (li ) . . . X£| i0Cv (l k )6(l k+1 , j k+1 ) . . . 5(l d , j 



n k+1 ...u d , 



(7) 



ai v > j a k _i ,<x k < 



dJ 



The A-orthogonal projector on the subspace span U is defined as follows 

R U = U(U* AUDITA. (8) 

It is easy to check that R^ = R u and R U U = U. Also, for any v such that (U, v) A = it holds 
Ruv = 0, hence the name A-orthogonal. 



3 Alternating minimal energy methods 

3.1 AMEn and ALS 

One of the main results of the previous paper [9] is the algorithm ALS(t+z) . Each iteration 
of this algorithm consists of one 'global' basis enrichment step which changes all the cores, 
followed by d update steps over all the cores subsequently. Classical optimization algo- 
rithms for tensor networks, e.g. ALS and DMRG, follow the alternating linear framework, 
i.e. update one or two neighboring cores at a time. We would like to keep the enrichment 
as well as other steps within the same idea, and propose another version of the method 
as Alg. [H which we will refer to as the alternating minimal energy algorithm (AMEn). The 
difference between two algorithms is illustrated by a simple three-dimensional example 
in Fig. [U 

To analyse the convergence of the ALS(t + z) algorithm, we can see it as a method 
which implements the (approximate) steepest descent step followed by a sequence of op- 
timization steps for the energy function. The approximate steepest descent step x = t + Hz 
with z ~ z = y — At and optimal h, gives [9, Thm. 1] the convergence rate Wi = w z + 0(e), 
where cu z denotes the progress of the exact steepest descent step. A fortiori, the global 
convergence of the ALS(t + z) is proven with the convergence rate not slower than cu 2 . The 
AMEn algorithm does not have a global enrichment step and the convergence can not be 
proven in one line. However, the convergence analysis is possible if Alg. [T]is seen as a re- 
current method. Though the theoretical estimates do not provide a clear distinction which 
method is preferable, in numerical experiments in Sec. [5] we will observe that the AMEn 
technique delivers more accurate solution, while the average convergence rate is almost 
the same as of the ALS ft + z] method. 
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Figure 1: Schematic representation of ALS(t+z) and AMEn algorithms 
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Figure 2: Schematic representation of the AMEn algorithm 



3.2 AMEn in two dimensions 

The idea of the convergence analysis is introduced by a two-dimensional example, see 
Fig. |2l In two dimensions Alg. [T]can be seen as a sequence of the following operations. 



1. Start from the initial guess 



[T ( 



] 



T (2) 



2. Update the first TT-core, minimizing the energy function over the entries of T^ 1 ' 



U( 1 »=argminj([T( 1 ) ] 



T (2) 



U 



[U' 



] 



T (2| 



3. Expand the basis in the first core using the first TT-core of the residual. 

y-Au = z^z = T(Z (1 \Z (2) ), u=[lP Z™] [T '' 

4. Perform the Galerkin correction step by minimizing the energy function over the 
bottom part of second TT-core. 



V t2) =argminj([U^ Z^] "^ V v = [U^ Z' 1 '] 



T (2) 

y(2) 



5. Minimize the energy function over all entries of the second TT-core 

X (2) =argminj([U( 1 ) Z^] X) , x = [ll™ Z^] X (2) 

A 

The ALS update steps 2 and 5 reduce the energy function by a factor p: 2 ^ 1 and \i\ ^ 1 , 
respectively, which can be rigorously estimated only locally, i.e. in a very small vicinity of a 
true solution. The basis enrichment step 3 does not provide any progress, because it does 
not change the solution vector, but only its TT-representation. The Galerkin correction 
step 4 does not technically present in the algorithm. If we omit it, the update step 5 will 
deliver the same TT-core, optimizing the energy function over the positions occupied by 
both upper and bottom parts of the second core. Without actually affecting the result of 
the computations, step 4 is essential to analyse the convergence of the whole method, since 
the progress of the Galerkin correction step can be estimated w.r.t. the one of the steepest 
descent. This idea is formally expressed as the following theorem. 



Theorem 1. In the notations set above for the two-dimensional linear system Ax = y one 
iteration of AMEn Alg.Q] provides the following progress 

J 1.7CJ ||X* ^||a _ II"* "^11 A II"* "II A _ 2 2 

T7+T ~~ li _ + ii2 = "ji _ + i|2 Ti ~ ijT = ^ cu ^' 
J [T-J II"* "''Ha II"* Ha II"* "^11 a 

M-<1, cux < co z < o> z = cu z + 0(e), 
where X = OMX) = X (1) x I n2 , Z = IPi(Z) = Z m x I n2 , and o> s is defined by 

2 -, ("*-u,R§(x*-u)) A /qx 

CUe = 1 " " . (9) 

(x*-u,x*-u) A 

Proof. The minimization in step 5 is written as xi = argmin x (2) J(Xx' 2 '), x' 2 ' = vecX' 2 ', 
cf. dU, and the gradient is zero when the Galerkin conditions (X* AX)xi = X*u are met. 
The solution after one AMEn iteration writes as follows 

x = Xxi 2) = X(X* AX)" 1 X* Ax* = R x x*, (10) 

where Rx is the A-orthogonal projector on X, cf. ©. Since u G span X we have 

n p w . J(") II"*-"IIa i ("*-U,Rx(x*-u)) A 2 m\ 

x*-x = (I-R x (x*-u), — - = ia =] 1 i = a 4- (11) 

J(uJ ||x*-u|| A (x* - It, x* - UJ A 

Similarly the progress of the Galerkin correction step 4 is estimated as follows (see H9J), 



J(v) _ ||x*-v||a _ _ (x„-u, R z (x*-u)) a _ iis2 
J(u) ||x*-u|| A (x* - u, x* - u) A 



v = (I-R z )(x,-u), ij-i = H? IM = 1 - ^_ ~>™~* -^ = 0,2, (12 ) 



It is not easy to estimate cux directly. However, since Z (1 ' is a part of X (1 ' , we have span Z c 
span X and therefore a)% ^ cu^. Similarly since z G span 2. it holds wz ^ tu z , where cu z 
denotes a progress of the perturbed steepest descent step. The final estimate for cu z is 
obtained in [9, Thm 1] with a precise derivation of the asymptotic 0(e) term. To finish the 
proof we note that J(u)/J(t) = \i 2 ^ 1 by construction of step 2. □ 

Remark 1. A convergence rate cu z of the steepest descent algorithm is estimated using the 
Kantorovich inequality as follows 

, Amax(A) — A min (A) r.r A ^ „ -, 
Amax(A) + A mi n(AJ 

where A min ( A) and A max ( A) denote the smallest and largest eigenvalues of A. For any w z we 
can choose such a threshold level e that u> z < 1 , which guarantees the global convergence 
of the AMEn algorithm. 



Algorithm 1 AMEn algorithm 



Require: System Ax = y in the TT-format, initial guess t = t(T) 
Ensure: Updated solution x = t(X) 
1: for k = 1 , . . . , d do {Cycle over TT-cores} 

Update U( k » = argmin s(k) J(T(X fl) , . . . , X^ 1 ', S (k) , T< k+1 \ . . . , T^)), 



For local problem (fTl]) approximate z k = u k — A k u k « £ k = t(Z k , . . . , Z k 



4: Expand the basis X' k ' 



u (k) z k 



(k] 



S (k+1) := 



k 





Recover the TT-orthogonality of X k = (X (1) , . . . , X (k \ S (k+1 \ T (k+2) , . . . , T (d) ). 
end for 
return x = t(X (1) , . . .,X (d) ) 



3.3 AMEn in higher dimensions 

In higher dimensions, the AMEn Alg. [I] can be described using the same scheme. We 
start from an initial guess t = t(T' 1 ',T^ 2 ). In step 2 we update the first core obtaining 
u = t(U (1 ',T^ 2 ). In step 3 we approximate the residual y — Au = z « z = t(Z [1) ,Z^ 2 ) 
and expand the first core by Zy>. These operations are numerically tractable, see Sec.|4]for 
details. Minimization problem in step 5 leads to the following linear system 

(x;ax 1 )x^ 2 = x;u, Xt =yux) = x (1) ®i n2 ... nd , x^ 2 = T(x (2 \...,x (d) ), (13) 

which has riU2 . . . n^ unknowns and is still too large to be solved directly. Note that Xi = 
X^ ® I is a rank-1 multilevel matrix, hence B = X^AXi has the same TT-ranks as A, but its 
TT-representation is shorter by one core. Similarly, X^u represents a smaller right-hand 
side in the TT format, and the solution x^ 2 is sought in the TT format as well. Therefore, 
the linear problem in d dimensions is reduced to the one in d — 1 dimensions, i.e. to the 
minimization over the remaining subtrain, and the same algorithm is applied recurrently. 
The convergence rate of AMEn is defined by a recurrent application of the result of 
Thm. [U We need to consider a sequence of the reduced problems 

A k x >k =y k , A k = X* <k AX <k , y k = X* <k y, X <k = y <k (X). (14) 

The initial guess is t^ k = x(T' k ',T^ k+1 ), the update step for the core T' k ' gives u k = 
T(U (k) , T^ k+1 ), further steps of AMEn return the solution x^ k = T(X (k) , . . . , X (d) ), and the 
true solution is defined by x^ k = A k u k . Similarly to Thm. [1] we have 

'\ K * u k|lA k ..2 ^ -, II** * llA k _ ,..2 i l C k> K J C kjA k 



||v^_ t >k||2 ~^k^'> ,, > k _ ||2 ~^k-' I, „2 > W 

ll x * x IU k ll x * u klU k ll L k|| A]c 

where c k = x^ k — u k , X = CPi (X (k) , . . . , X (d) ), and R^ denotes the A k -orthogonal projector 
to X. With these definitions in hand we write the following theorem. 

Theorem 2. The AMEn Alg. [1] converges globally with the following convergence rate 



^ = ^ 2 (cu 2 + (l-a) 2 )^(u) 2 + (l-u) 2 )^ 2 (u) 2 + ... + ^_ 1 a) 2 d _ 1 )---)) 



|X* X| 

l x * 

d-1 k-1 k (16) 

^cu 2 nn-^n^=* d - 

k=1 j=1 j=l 

8 



Proof. In two dimensions, the theorem reduces to Thm. [TJ which proves the base of re- 
cursion. We suppose that (pT6]> holds in d — 1 dimension and prove it recurrently for d 
dimensions. 

If t = t(T (1) , T >2 ) is the initial guess, u = t(U (1) , T >2 ) appears after the update of the 
first core, and x = t(X' 1 ', X^ 2 ) is the result returned by one iteration of AMEn, the error 
x* — x is written as follows 

x,-x = x 7t -T(X (1) ,Xf)+T(X (1) ,Xf)-T(X (1) ,X^ 2 ) = (^-X 1 xf)+X 1 (xf-x^ 2 ), 

where the first term is the error of the first ('outer') AMEn step provided the solution x^ 2 
of the reduced problem ((131) is computed exactly, and the second term is the error of other 
('inner') AMEn steps, which we find using the assumption of the recurrence. Using ((TTj) 
we show that these terms are A-orthogonal, 



x* - x = (I - R Xl ) x* + X ] (xf - x >2 ) , 

|x,-x||i = ||(I-R Xl )x,|| 2 v + ||X 1 (xf -x^" : 



(17) 



A 



The first term writes by Thm. [T] as follows, 

||(I-R Xl K||i ||(I-R Xl )(x*-u)|| A K-u||i ||(I-R Xl )(x*-u)|| 2 _ 2 2 



ii ii") ii iiT ii iiT ii n") 

v t * \W + \\ l he t r \W n 2 

II " 1 1 /A. II * 1 1 r\ II * 1 1 r\ II " 1 1 /A. 

For the second term we need the following norm equivalence, 

,^2 ^>2\\\ 2 __ l^-l „>2 'Y*A r r f~>2 ^>2\\ __ ||^>2 „>2\ 



\h*»v 



Xi [xf - x >2 ) || A = (xf - x >2 , Xf AX, {xf - x^ 2 )) 



x; 



where B = X| AXi . In the right-hand side we see the error norm of the AMEn algorithm 
applied to the linear problem (|T3)) with d— 1 cores. According to our assumption, it writes 
by ((16)) as follows, 



| v >2_ v >2||2 d-1 k-1 I.- 

I** K II B 



£_aZH{l-tf)Y[vl = 4&-u (18) 



|| x ^ 2 _0 2 || 2 

H X * X llB k=2 j=1 j=1 

and for the second term we obtain 

\\YJy> 2 — x> 2 lll 2 || Y > 2 _+>2||2 II <y f >2 _ + >2a ||2 

ll^i l x * * J Ha _ ,k2 ll x * t Hb _ ^2 ,,2 ll x U x * x JHa 

n 7TT2 ~~ ^d-l ii 7T|2 ~~ *rd-1 M-1 ii [7? ■ 

V — T V — T V — II 

1 1 A II A. 1 1 A 1 1 A. 11" 1 1 A. 

In the numerator we simplify Xix^ 2 = R Xl x*, cf. ((T0|), and 

X^> 2 = x[X^\t >2 )=x([U^ ZW] ^ ] =T(U( 1 \t^ 2 )=u = R Xl u. 

Finally, we write last part of the seconf term as follows, cf. ((TTj), 

||Xi(xf-t> 2 )|| A ||R Xl (x*-u)|| A 



1 P M p -l-O),. 

— 11 rv — 11 

i * Ha II * Ha 



Substituting both terms into (fTT) , we complete the proof by 



K-x|| A _ ||(I-R x Jx,||i giUf -x^ 2 ) 

1 iiT ii iiT ! ii iiT 

he — + 2 \\k — t 2 he — t |2 



^o)f + ^(l-cuf)4)i_ 1 =44 (19) 



IA ll'V* L ||A II /v * L HA 

where 4>d-i evaluates the convergence rate of the AMEn in d — 1 dimensions by ((T8)>. D 

9 



Remark 2. From the recurrence relation (fT9~l> it is clear that if the convergence of the re- 
duced problem 4> d _i < 1 , then the convergence rate of the full problem 



7 7 7 

4> d = M-tOJ! 



M-fO-cuf)^.^!. 



By Remark [H we can always choose the approximation threshold e to ensure 4>2 < 1 for 
the AMEn algorithm in two dimensions. Therefore, we can always choose e to provide 
4>d < 1 , which guarantees the global convergence of Alg.[TJ 



Similarly to Thm. [T] we can estimate cu k in (|T5) as follows, 
(c k , R x 



Uh 



1 



,k Ck)Ak < 1 _ (Ck, R z k C k ) Ak ^ ! (Ck> ^ k • k ./\, _ > 



\r II 2 
l C k|| A 



l C klli k 



^ 1 



)<A k > 
v z k 



Ck)/ 



Ir II 2 

l c TclU k 



= a v 



where £ k « z k = A k c k = y k - A k u k , z k = T(Z (k) ,Z^ k+1 ) and Z = y 1 (Z (k) ,Z^ k+1 ). In the 
right-hand side we see the convergence rate of the perturbed steepest descent method 
applied to the reduced problem with the matrix A k . It is estimated [9, Thm 1] as follows 



">z k = cu 2k 



0(e) 



cu Zk ^ - 



: (A k 



t (A k 



c(Aic) + A mln (A k 



= n(A k )<i, 



where e denotes the relative accuracy of 2 k . It can be shown that if all X <k are orthogonal 
we have 

A m in(A k ) > A min (A k _ 1 ), A max (A k ) < A max (A k _!), 

Amax(Aj — A min (A) 



Q( A d _! ) ^ Q( A d _ 2 ) < . . . < Q( A, ) = Q( A) 



A max (A)+A min (A)' 



(20) 



where the last term estimates the convergence rate of SD algorithm applied to Ax = y . 

Remark 3. The requirement for X <k to be orthogonal for k = 1 , . . . , d is equivalent to 
the TT-orthogonality of the tensor train X, see |24[ for details. As a counterpart of (|20)) , 
we may say that this requirement prevents the condition numbers of reduced matrices 
in (|T4|) from increasing, which is essential for numerical stability. By construction, X' k ' = 
|\jM Z 0<)J does not provide the TT-orthogonality of X. An additional step is required to 
recover the orthogonality, i.e. make the TT-core X' k ' column-orthogonal. It is done via a 
QR decomposition 



U M z 



(k) 







[ftM Z«] 



Ruu Ru 

Rz 



jClc+1) 





= [fJM Z< k >] 



R T (k+1) 

lx uu ' 





We denote the result after orthogonalization by the same symbol X' k) = X' k ' = [tl' k ' Z' k '] . 
All considerations in Theorem |2] remain valid, since all estimates are based on the sub- 
spaces, which are unaffected by the QR decomposition. Therefore, we imply the TT- 
orthogonality silently to simplify the discussion, and the actual operation is fast and does 
not influence the analysis. 

In [91 Thm. 3], the convergence rate of the greedy descent algorithm t + ALS(z) is given 
by exactly the same formula as (fT6)) , but the values cu k are defined differently. In the AMEn 
method, cu k is given by (|T5)> and relates to the convergence of the reduced problem ([M]) . 
For the algorithm t + ALS(z) it is defined as uj k = o>z, <k in terms of (0). Since span %<j k+ i e 
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span Z-sck/ for the greedy algorithm it holds cu k+ i ^ w\c- For the AMEn algorithm, we can 
prove this only for upper bounds 0(A k ) as shown in (|20)> . 
Considering the 'width' of Z^ and X^ k , we can expect that 

cu k (AMEn) < u> k (t + ALS(z)), MAMEn) ^ fj. k (t + ALS(z)), 

which is observed in numerical experiments. It is not clear however whether this heuristic 
statement holds in general. 

4 Fast approximation of the residual 

In this section we discuss how to compute the approximation on step 3 in AMEn Alg. [1] 
efficiently. 



4.1 SVD-based approximation 

In steepest descent schemes proposed in [9], the low-rank approximation of the residual is 
computed once per iteration, and a standard SVD-based TT-rounding procedure from [24] 
can be used. In AMEn Alg. [T]we can not approximate each z k individually by the TT-SVD, 
since it makes the total complexity quadratic in the dimension d. To keep the complexity 
linear in d, we have to investigate the tensor structure of z k . 



Looking at the TT representation of the reduced system (jT4|) and recalling that X <k = 
X <k g) I nk -n d has a rank-one structure, we conclude that u k = X< k ij inherits the blocks 
k + 1 , . . . , d from u as follows, 



TJkl 



(Ofc-iiic, • • • ,i d ) = Yi k Wii k )Y (,c+1] (iic + i) • • • Y (d) (id), 



where Y k = CP^X' 1 ', . . . , X' k ')Y^ k . The similar representation holds for the local matrix 



(k] 



A k = T(A^,A 



(k+1) 



A td| ). Therefore the local residual z k = u k — A k u k writes 

^ (k+1) (W 1 )--^ (d) (id), 



z k l<x k _i i k . . .Id 



2?' 



(X k _!l k J 



(21) 



Z k k) (a k _ii k ) = 



(k) 
*-k 



YrVntk) -K k] (a k ^i k} (3 k _!J k ) ® UW(3k-iJ! 



£vh 



^ (d) (id) = 



2,...,d-l, 



Y (p) (ip) 

Y (d) (id) 
A( d Hid,jd)®T( d '(j d ) 

The TT decomposition of the exact residual z k = t(Z{, , Z (k+1 ] , . . . , Z (d) ) has only one block 
which actually depends on the information gained in the step k, the others can be precom- 
puted before the iteration. In the approximate residual z k = x(Z k , . . . , Z k ) all blocks 
depend on the recently computed U tk \ but only one block Z k is actually required. This 
means that if we keep all TT-cores Z' p ' right-orthogonal, we can compute Z k by the SVD 
compression of Z k only. Therefore, the SVD-based approximation of the residual in- 
volves the information from only one core, and the complexity of each enrichment step 
does not grow with d. The overall complexity is therefore linear in d, as required. 
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4.2 Cholesky-based approximation 

The singular value decomposition provides the optimal approximation accuracy for a pre- 
scribed rank, but is numerically expensive. Each TT-core Z' k ' has the sizes R k _i xn^x R k , 
where R k = r k r k (A) + r k (u), and the QR and SVD operations have the complexity 0(R 3 ) = 
0(r 6 ), which may be inefficient. Since a very precise approximation of the residual is not 
always required, we may avoid expensive QR and SVD steps by considering the Unfin- 
ished Cholesky algorithm (see, e.g. [30|]) applied to the Gram matrix of the first unfolding 
of z k . A careful implementation allows to reduce the complexity to 0(r 5 ). 
Given ((21), its first unfolding reads 

Z k k} (^^,i k+1 , . . .,i d ) = Z^^k) • Z (k+1) (ik + i) • • -2 (d) (id), 

and the Gram matrix G k = Z k (Z k )* computes as follows, 

G k (^^, fr-fc) = r£\'Z^ y fr~fc) ■ r (k+1) • ■ ■ r (d) , 

where if ' fa^, fr^) = Zf'^Q ® Z k k) (|3^U and V^ = Z^(ip) ® 2 w (ip) for 
p = k + 1 , . . . , d. 

Similarly to the SVD-based method, we can precompute V^ before the iteration. The 
enrichment vectors are then calculated as the factors Z k in the Unfinished Cholesky de- 
composition 

G k (a k _ii k , |3 k _ij k ) sa Z k k) (cXk-ii-k)D ^Z k k) (|3 k _ij k ; 

4.3 ALS-based approximation 

To reduce the complexity even further, we can approximate z ~ z = y — Au using the 
auxiliary ALS iteration. We start from some low-rank initial guess z = t(Z) and minimize 
||z — z|| under the constraint z = 2^ k z (k \ where z {Y] = vec Z (k) . For a unitary Z^ k this leads 
to the extremal condition z (k) = Zy k z. 

Until the convergence of the fixed-rank ALS is not proved, this approach is heuristic. 
However, as was observed in numerical experiments, it provides the enrichment basis 
almost of the same quality as the SVD-based method, although much faster. It is enough 
to conduct two alternating methods simultaneously step by step, which means that only 
one ALS update is performed for z between the subsequent AMEn iterations. 

The algorithm is organized as follows. Given some low-rank approximation z, we as- 
sume that span(Z >k ) is a good approximation basis for z k as well (which appears to hold 
in practice). That is, the enrichment is computed as a projection 

vecZ k k) =IP;, 1 (Z (k \...,Z (d) )z k . (22) 

Now, we need to update z for the forthcoming iterations. The current solution approxi- 
mant is u = T(X <k , U (k) , T >k ), so the TT-core Z (k) writes 

z« = Z^fo " Au) = Tfyx - ^ k A^ k (U)u< k », 
U = (X^,...,X( k - 1 \u( k »,Tt k+1 »,...,T( d »). ( ' 

Note that Z (k) serves only as an update of the global residual approximation z and cannot 
be used as an enrichment directly. 
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Similarly to the previous sections, one may avoid 0(d 2 ) cost of (|23l) , (|22)> by performing 
all calculations involving the same TT blocks (e.g. z' p ') only once during the AMEn iter- 
ation. The resulting complexity is therefore that of the fixed-rank ALS, 0(pu 2 r 3 ), where 
p = r(z) is the TT-rank of z. In practice, it is usually enough to take p ^ 5. 

Finally, let us note that the minimization ||z — z\\ is equivalent to the maximization 
of (z, z). In other words, the AMEn method solves approximately the following minimax 
problem, 

maxmin(z, y — Ax) = maxmin(z, x* — x)a, 

Z X Z X 

by performing the subsequent ALS updates for z in the rank-p TT format, and x in the TT 
format with a varying rank r. This allows us to establish a connection between the AMEn 
method and the greedy approximations, in particular, the Minimax Proper Generalized 
Decomposition [22 J. However, the greedy techniques usually perform the optimization 
over rank-1 separable tensors. As a some improvement one may mention the orthogonal 
greedy method, which orthogonalizes the residual to the basis of R current canonical fac- 
tors of the solution (i.e. selects R scalars). The AMEn approach may be considered as a 
next milestone in the family of adaptive tensor-structured linear solvers. By updating a 
larger portion of solution data at a time, it appears to be more robust and accurate, as was 
demonstrated in Q. 

5 Numerical experiments 

In these experiments, we compare the MATLAB versions of AMEn algorithms proposed 
above (SVD, Choi, ALS) with the ALS(t+z) method from the previous work [9], as well as 
the DMRG method from |SJ. The AMEn and DMRG methods were implemented within 
the framework of the TT-Toolbo>o 2.2 (routines amen_solve2 and dmrg_solve3, respec- 
tively), and the computations were done at the Linux machine with 2.6 GHz AMD Opteron 
CPU, and MATLAB R2012a. 

5.1 SPD example: Poisson equation 

First, we consider the same symmetric positive definite example as in 0. This is the high- 
dimensional Poisson equation, 

-Ax = e, xgO = [0, l] d , x| 9Q = 0, 

where A is the finite difference Laplacian discretization on a uniform grid with 64 points 
in each direction, i.e., the total size of the system is 64 d , and e is the vector of all ones. 
Different greedy-type methods were compared in [9j, as well as the ALS(t + z] method. 
Now we focus on non-greedy techniques, including AMEn+SVD and AMEn+ALS, see Fig. 
|3j The TT-rank of the enrichment z is p = 4, Frobenius-norm threshold for the solution 
tol = 10 5 , and the problem dimension d = 16. 

We see that all methods except the DMRG demonstrate comparable performances. 
Even though in the beginning of the iterations the ALS(t + z) method seems to be the 
fastest, it approaches the same CPU times as the AMEn methods when the rank increases. 
In this example, the desired accuracy level is reached by all algorithms. However, it might 
be not the case, as we will see in the following. 



http : //github . com/oseledets/TT-Toolbox 
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Figure 3: A-norm error in different methods versus iterations (left), and CPU time (right). 
Poisson equation, d = 1 6 

5.2 Nonsymmetric example: Chemical Master Equation 

The second example is the Chemical Master Equation ||34|, applied to the d-dimensional 
cascade gene regulatory model |[T3~HT|. This is the huge-sized ODE 



dip(t) 

dt 



= Ai|)(t) G M n 



where \p(t) = {i|)(i, t)}, i = (it, . . . , i d ) e [0, . . . , 63]® d , so that n = 64, and the operator is 
formulated as follows, 

A = Ai+--- + A d , 
Aii|)(i,t) = cx -(^(i-ei,t)--4)(i,t)) + 6-((i l + l)t|)(i + e 1 ,t)-i 1 iKi,t)), 
(3ik-i 



A k i|)(i,t) 



(3iic-i + y 



(i|)(i - e k , t) - Tj)(i, t)) + 6 ■ ((iit + 1 )i|)(i + ei„ t) - i k x|>(i, t)) 



for k = 2, . . . , d, where e k e R d is the k-th identity vector. The particular model parameters 
were fixed to the values 

cc = 0.7, 8 = 0.07, (3 = 1, y = 5. 

The Chemical Master Equation serves as an accurate model for gene transcription, pro- 
tein production and other biological processes. However, its straightforward solution be- 
comes impossible rapidly with increasing number of species d. Existing techniques in- 
clude the Monte-Carlo-type methods (so-called SSA |TT| and its descendants), as well as 
more tensor-related ones: Sparse Grids ||13|, greedy approximations in the canonical ten- 
sor format [1J and tensor manifold dynamics |15|. The first two approaches only relax 
the curse of dimensionality to some extent; typical examples involve up to 10 dimensions 
and may take from 15 minutes to many hours on high-performance machines. Tensor- 
product low-rank approaches seem to be more promising. Unfortunately, we cannot esti- 
mate a possible potential of greedy or manifold dynamics methods, whereas up to now 
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Figure 4: Error (left) and CPU time (right) of the DMRG method vs. the dimension 

our alternating linear solution technique appears to be more efficient. For more inten- 
sive study of the CME applications of the AMEn and DMRG methods see |6] and [16J, 
respectively. Note that for systems with moderate dimensions and smaller time steps, 
the DMRG method can be of a good use for such problems, as was demonstrated in [16J. 
However, as we will see, the AMEn algorithm appears to perform better than DMRG for 
more complicated problems. 

Two specific tricks allow to take more benefits from the tensor structuring. First, we 
employ the Crank-Nicolson discretization in time, but instead of the step-by-step propa- 
gation, consider the time as a (d + 1 )-th variable and formulate one global system encap- 
sulating all time steps 0, 

(i-^a)\KWi) = (l + jAJMt m ), m = 0,...,N t -1, 

where t m = ttti, t is the time step size, and the initial state is i|>(0) = <S)k=i e i/ e i e ^ 64 i s 
the first identity vector. In particularly, we choose N t = 2 12 , T = 10, and t = T/N t . Such 
a time interval is not enough to reach the stationary solution, but the transient process is 
also of interest. As a result, we end up with a (d + 1 )-dimensional system of size n d • N t . 

Second, we prepare all the initial data and seek the solution not in the ( d+1 ) -dimensional 
TT-format directly, but in the so-called Quantized TT format |fT7|: we reshape additionally 
all tensors to the sizes 2 x 2 x ■ • • x 2, and apply the (dlog 2 (n) + log 2 (N t ))-dimensional 
TT decomposition, but with each mode size reduced to 2. 

However, the matrix is strongly nonsymmetric, which makes difficulties for the DMRG 
approach. We fix the truncation tolerance for the solution to tol = 10 5 , and track the 
Frobenius-norm error of the DMRG solution w.r.t. the reference one, obtained by the 
AMEn+SVD method with tolerance 1 8 , versus the dimension d, see Fig. |U Since the 
DMRG technique takes into account only local information on the system, its accuracy 
deteriorates rapidly with the increasing dimension. A stagnation in a local minimum is 
also reflected by a sharp drop of the CPU time, since the method skips the "converged" 
TT blocks. This makes the DMRG unreliable for high-dimensional problems, even if the 
QTT format allows to get rid of large mode sizes. 



15 




2 4 6 8 10 12 14 16 18 20 
log ]0 residual vs. iterations 



10 12 3 

log ]0 residual vs. log ]0 time, sec 




2 4 6 8 10 12 14 16 18 20 
log ]0 error vs. iterations 




■ dmrg 

■ dmrg-s 

■ als(t+z) 

■ als(t+z)-s 

amen 
. amen-s 
stop 



12 3 

log 10 error vs. log ]0 time, sec 



Figure 5: Residual (top) and error (bottom) in different methods vs. iterations (left), and 
CPU time (right), 20-dimensional problem 

Now, we fix the dimension d = 20, and compare both the error and residual accura- 
cies of all methods, as well as the computational times. In all cases, the Frobenius-norm 
tolerance was set to tol = 1 5 , and the enrichment rank to p = 4. 

First of all, since our methods are proven to converge in the SPD case, we shall examine 
both the initial and symmetrized systems (Fig. [5). A well-known way to treat a general 
problem via a symmetric method is the normal, or symmetrized formulation, A* Ax = 
A*y. However, both the condition number and the TT ranks of A* A are the squared ones 
of A, and this approach should be avoided when possible. 

Three particular techniques are considered: the DMRG method, the AMEn+SVD 
(marked as "amen" in Fig. [5) and the ALS(t + z] one. The symmetrized versions are 
denoted by the "s" tag. 

In addition, note that the convergence of the methods may be checked locally due to 
the zero total correction after the enrichment in Alg. [Q before recomputing the (k + 1 )-th 
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Figure 6: Error in different methods vs. 
dimensional problem 



iterations (left), and CPU time (right), 20- 



block, calculate the local residual provided by the previous solution T' k+1 '. If it is below 
the threshold tol for all k, the method may be considered as converged, and stopped. 
Occurrences of this fact are marked by red rectangles ("stop"). 

We observe that the symmetrization allows the DMRG method to converge at least 
to the accuracy 1 3 , but increases the CPU time by a factor greater than 100 due to the 
squaring of the TT ranks and condition number of the matrix. Contrarily, for the AMEn 
and ALS(t + z) methods the symmetrization is completely inefficient and redundant: de- 
spite pessimistic theoretical estimates, the nonsymmetric algorithms converge rapidly to 
an accurate solution approximation. 

Though the non-symmetrized methods may admit oscillations in the residual, the 
Frobenius-norm error threshold is almost satisfied inboth AMEn and ALS(t+z) methods. 
Nevertheless, the AMEn algorithm appears to be more accurate thanks to the enrichment 
update in each step. Also, its local stopping criterion is trustful: it fires just after the real 
error becomes smaller than the tolerance, which is not the case for other methods. 

Since both AMEn-type methods in this test exploit the SVD-based residual approx- 
imation, they demonstrate almost the same CPU times. However, using the additional 
techniques from Section [4] we can reduce the complexity while maintaining almost the 
same accuracy, see Fig. [61 While the AMEn+Chol method still operates with the exact 
residual, the AMEn+ALS only needs to compute scalar products of the true residual and 
its low-rank approximation, which makes it more efficient than the AMEn+SVD method, 
as well as the ALSft + z) one. 

Finally, we test the performance of the two AMEn realizations with respect to the en- 
richment rank p (TT-rank of z), see Fig. [7J As expected, the higher p is, the more accurate 
solution can be computed. On the other hand, it is not necessary to pick very large ranks, 
since the corresponding accuracy improvement does not overcome the significant increase 
in CPU time. 
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Figure 7: Error (left) and CPU time (right) in AMEn methods vs. enrichment rank, 20- 
dimensional problem 

5.3 Fokker-Planck equation for complex fluid dynamics 



qi 



qi 



qa-i 



qd 



Figure 8: Bead-spring model of a polymer in a fluid 

Another example of high-dimensional problems arising in the context of probability 
distribution modeling, is the Fokker-Planck equation (see e.g. ||27|). As a particular appli- 
cation, consider the 8-dimensional Fokker-Planck equation of the polymer micro-model 
arising in the non-Newtonian fluid dynamics [HEHGI- The polymer molecules in a so- 
lution are subject to the Brownian motion, and are often modeled as bead-spring chains 
(see Fig. |8j. The spring extensions, being the degrees of freedom of the dynamical system, 
become the coordinates in the Fokker-Planck equation. 

We consider the case of 4 two-dimensional finitely extensible nonlinear elastic (FENE) 
springs in the shear flow regime according to [2], 



3^Mq 



at 



-=Z V *- f KqiJ>(q, t) - X. Dtj (F,(q)iJ>(q, t) - V^ipCq, t)) j , (24) 



where q = (qi, ..., qs) = (qi,i> qi,2> •••> <\a,i) is the stacked spring extension vectors (q Pi k is 
the displacement of the p-th spring in the k-th direction), 



D = -tridiag{-1,2,-1}®I 2 = - 
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Figure 9: Components of the stress tensor vs. time 
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Figure 10: Accuracy of the stress tensor vs. time step and grid size 



is a spring interaction tensor, 

K = I 4 ® (V x u) =diag{1,1,1,1}< 
is a flow velocity gradient (shear flow case), and 



0.8 




Fj(q) 



1-Iqil7b' 



iqii 



qf,i + q?,z> b = 5 > 



is the FENE spring force. Note that the singularity in Fj limits the maximal length of a 
spring to \fb. Moreover, the probability density ip at the point |qj| = y/b (and any with 
larger modulus) is zero. Therefore, the domain shrinks to the product of balls B = M®^-. 
A quantity of interest is the average polymeric contribution to the stress tensor, 



d d„ 

t(t) = Y <qiFi(q) T -h) = y\ M* t)q i F i (q) T dq - d ■ I 2 , 
1=1 1=1 Jb 



(25) 



with the normalization assumption J ip (q, t) dq = 1 . 

To recast the problem domain into a hypercube, the polar coordinates are employed, 

q — > (ti , 0i , ..., r 4 , 4 ) G ( [0, v b) ® [0, 2n) ) . The discretization is done via the spectral 

elements method (see e.g. |33fV We will vary the number of spectral elements in each 
radial direction n r , but the number of angular elements (in 0t) is fixed to 2n r . With typical 
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Figure 11: CPU time vs. time step and grid size 

values n T ~ 20, we end up with tensors of size ~ 1 12 and dense populated matrices, which 
are intractable in the full format. Since the spectral differentiation matrices are found to 
be incompressible in the QTT format, the 8-dimensional TT representation is used. 

We would like to compute the stationary state of (|24[) , so we use the simple implicit 
Euler (inverse power) method as the time discretization, 



;M + AA)iKt m+1 )=MiKt T 



Am, ra = 0, ...,N t — 1, 



where M is the mass matrix, A is the stiffness matrix. The time integration was performed 
until T = AN t = 20, which is enough to approximate the steady state with a satisfactory 
accuracy, and the (unnormalized) initial state was chosen iKq, 0) = 0t = i — lqil 2 /b) b/2 , 
which corresponds to the zero velocity gradient K = 0. Since A is not a "time step" but a 
parameter of the inverse power method, we will check the performance w.r.t. A as well. 

In the previous example we have observed that the AMEn+SVD method is in fact su- 
perfluous, since the AMEn+ALS method delivers the same accuracy with lower cost. Both 
mode sizes (up to 48) and TT-ranks (up to 73) in this example are relatively large, so we 
will consider only the AMEn+ALS. We set the Frobenius-norm threshold to tol = 10 4 , 
and the enrichment rank p = 3. The initial guess for the AMEn+ALS method is taken 
from the previous Euler step. 

First, let us track the evolution of the stress tensor components ((251) versus Euler iter- 
ations, see Fig. HI We see that the stress does really stabilize in the chosen time range. 
Moreover, the last component tends to zero, and can therefore be used as an in-hand mea- 
sure of the accuracy. In addition, we compare t(1 , 1 ) and t(1 , 2) with the reference values 
computed with n T = 28 and tol = 1 0~ 5 , see Fig. [101 For all A except 1 (which is too large), 
and n T = 24, the accuracy attained is of the order 1 0~ 4 + 1 0~ 3 . Note that typical accuracies 
of greedy or MC methods for many-spring models are of the order 1 _1 f2"l!35||. 

Finally, the computational times can be seen in Fig. [TTJ As expected, the complexity 
increases quadratically with the number of spectral elements n T . An interesting feature 
is that the total CPU time decays with increasing A. It points out that the performance of 
the AMEn method depends weakly on A, and henceforth on the matrix spectrum. On the 
contrary, the quality of the initial guess (in terms of both ranks and accuracy) is crucial. 
This may motivate attempts to relate the AMEn methods to Newton or Krylov iterations 
in a future research. 
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6 Conclusion 

In this paper we develop a new version of the fast rank-adaptive solver for tensor-structured 
symmetric positive definite linear systems in higher dimensions. Similarly to the algo- 
rithms from [9], the proposed AMEn method combines the one-dimensional local updates 
with the steps where the basis is expanded using the information about the global residual 
of the high-dimensional problem. However, in AMEn the same steps are ordered in such 
a way that only one or two neighboring cores are modified at once. Both methods from [9] 
and the AMEn converge globally, and the convergence rate is established w.r.t. the one of 
the steepest descent algorithm. The practical convergence in the numerical experiments 
is significantly faster than the theoretical estimate. The AMEn algorithm appears to be 
more accurate in practical computations than the previously known methods, especially 
if local problems are solved roughly. 

The asymptotic complexity of the AMEn is linear in the dimension and mode size, sim- 
ilarly to the algorithms from 0. The complexity w.r.t. the rank parameter is sufficiently 
improved taking into the account that a limiting step is the approximation of the residual, 
where the high accuracy is not always essential for the convergence of the whole method. 
We propose several cheaper alternatives to the SVD-based TT-approximation, namely the 
Cholesky decomposition and the inner ALS algorithm. The ALS approach provides a 
significant speedup, while maintaining almost the same convergence of the algorithm. 

Finally, we apply the developed AMEn algorithm to general (non-SPD) systems, which 
arise from high-dimensional Fokker-Planck and chemical master equations. Theoreti- 
cal convergence analysis can be made similarly to the FOM method, which is rather pes- 
simistic and puts very strong requirements on the matrix spectrum. In numerical experi- 
ments we observe a surprisingly fast convergence, even for strongly non-symmetric sys- 
tems. Here the AMEn demonstrates a significant advantage over the DMRG technique, 
which is known to stagnate, especially in high dimensions, see |{23l 13 and Fig. |5l 

There are many directions of a further research based on the ideas of [9] and this paper. 

First, the ideas developed in this paper can be generalized to other problems, e.g. find- 
ing the ground state of a many-body quantum system or a particular state close to a pre- 
scribed energy. The combination of update and basis enrichment steps looks very promis- 
ing for a wide class of problems, as soon as the corresponding classical iterative algorithms 
can be adapted to provide a proper basis expansion in higher dimensions. A huge work 
is done in the community of greedy approximation methods, where the cornerstone is a 
subsequent rank-one update of the solution. 

Second, there is a certain mismatch between the theoretical convergence estimates, 
which are at the level of the one-step steepest descent algorithm, and the practical conver- 
gence pattern, which looks more like the one of the GMRES. This indicates that there are 
further possibilities to improve our understanding of the convergence of the AMEn and 
similar methods. Our rates can benefit from sharp estimates of the progress of the one- 
dimensional update steps, which at the moment are available only in a small vicinity of 
a true solution, which is hard to satisfy in practice, see [28 J. The superlinear convergence 
observed in numerical experiments inspires us to look for possible connections with the 
theory of Krylov-type iterative methods and a family of Newton methods. 

Finally, we look forward to solving more high-dimensional problems, and are sure 
that they will bring new understanding of the advantages and drawbacks of the proposed 
method, and new questions and directions for a future research. 
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A FOM theory 

As was observed in the numerical experiments, the AMEn method works successfully 
even being applied directly to non-symmetric systems. Though we cannot support this 
behavior with sharp estimates, one may proceed similarly to Section |3l and establish a 
formal theory, relating the AMEn to the Full Orthogonalization Method. 

A.l Galerkin projection and angles between subspaces 

Like in the SPD case, we begin the analysis from the two-dimensional case. Given a linear 
system Ax = y and some basis V, the projection method is performed as follows, 

x = Vw = V(V*AVr 1 V*y, y - Ax = (I - AV{V*AV)- ] V*)y. (26) 

Given an initial guess t, we assume t e span( V), and z = y — At e span( V). Then it holds 
also y - Ax = (I - AV^AV)- 1 V*)z. 

So, (|2~6"1> performs an oblique projection of the residual. Its analysis is often conducted 
with the help of the orthogonal projection, 

((AV)*AV)w = (AV)*z, y - Ax = (I - AV(V*A*AV)- 1 (AV)*)z = (I - P A v)z, 

i.e. the residual minimization on V. The case V = z is known as the MINRES method. Its 
convergence was analysed in e.g. [29J, 

ll a II • ra -mi II (1 ^ I(AZ,Z)| 

\\y — Ax = sin Az, z z , cos Az, z = 77- — rr^— 7-, 

II v 7 ^ II II ' v ) J AtT 

/A.Z Z 

i.e. ( Az, z) is the acute angle between Az and z. The worst convergence rate is estimated 

as 

l(Az,z)| 



o>mr =maxsin(Az, z), Wl-OJ^ R =min , 

z V z ^0 AZ I Z 

and for a positive definite matrix is guaranteed to be less than 1. The same approach may 
be used for the block case as well, 

\\y — Ax\\ = sin(AV, zj ||z|| , cos[ AV, z) = max ■ 



q^o ||AVq||||z|| 

Obviously, if z 6 span(V), it holds sin(AV, z) ^ sin(Az, z) < 1. 

Unfortunately, for the oblique projection (|26"1> one cannot guarantee the monotonous 
convergence in general. However, assuming a certain well-conditioning of the system, we 
may relate the old and new residuals by a factor smaller than 1 as well. 

Lemma 1. Given a column-orthogonal matrix V, initial guess t G span(V). Assume the 
smallest eigenvalue \i = A min (V*AV + V*A*V)/2 > 0, and ||z - W*z|| ^ e||z||. Denote 
ii^Tjl = y/1 — ujy. Then, the progress of (|26)) is bounded by 

/I -OJt, 
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Proof. First of all, notice that u — Ax is orthogonal to V, 

y-Ax = (W*)(I-AV(V*AV)- 1 V*)z+(I-W*)(I-AV(V*AV)- 1 V*)z 
= (I-W*)z-(I-W*)AV(V*AV)- 1 V*z. 

Then, \\y — Ax|| ^ e||z|| +sin(V, AVw) ||AVw||, where w = (V*AV) _1 V*z. For the angle we 
can derive the following chain of inequalities, 



cos(V, AVw) = max^^^min^M 

V ' ' || q ||=i H AVw ll w^O H AVw ll 

^ . ||w*V*AVw|| ^ . ||w*V*AVw|| 

^ min " ,,.,, n " > min - — ^11 — -, 

11 11 1 AVw 'ii n -, AV ' 

||w||=l ' ' ll w ll = l 

from which we get sin(V, AVw) ^ wy- On the other hand, 

. ||w*V*AVw|| A min ((V*AV + V*A*V)/2) ^ g min (V*AV) 

IHN ||AV|| " ||AV|| " ||AV|| ' 

so that — n/Jli/i ^ / ] i • Therefore, the residual estimates as follows, 

<WV*AV) ^ ^/l-cu^ 

1 iii* 11 ^ <^V 



(27) 



sin(V, AVw)||AVw|| ^ cu v ||AV|| — , A ,J |V*z|| < v V1 - e 2 ||z 



a min (V*AV) M " ^ yfl^r 



D 



Remark 4. It holds 



: — |z*Az| ||w*V*AVw|| r^r^Tr ^ 

I — <i>t,p = min -7— — fT- ^ min — — — - — n — ^ cos V, AVw , 

MR ||z||=i ||Az|| || W ||=1 ||AVw|| 

since the minimization over Vw is a restriction w.r.t. the minimization over z in the full 
space. Hence, sin(V, AVw) ^ cu MR . However, sin(V, AVw)/^/T^uJy ^ tan(V, AVw) 
might be greater than u>mr, and even greater than 1. 

Remark 5. If V contains the m-th Krylov subspace, we obtain the so-called FOM method. 
The progress of the FOM can be related to that of the GMRES as follows ||29|, 



F _ ">m 



m -, 



1 -«/<_,)' 

where cu k , cu£ are the progresses of the k-step GMRES and FOM, resp. Note the similar 
term uj v /\/1 — ivy in Lemma [TJ 



Remark 6. The condition \\z — W*z|| ^ e||z|| may reflect the residual approximation, i.e. 
z^zG span(V),butz ^ span(V). Both SVD- and ALS-based approximations (see Section 
S) fit to this scheme: the SVD approximation reads z = UU*z, where U is the singular 
vectors, and the ALS approximation reads z = Z^Z^z. 
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A.2 Recurrent residual accumulation 

Lemma Q] applies immediately to the two-dimensional AMEn method, by setting V = Xi . 
Despite the generally pessimistic estimate, it occurs in practice that X^f AXi is nonsingular, 
and moreover, cux, is rather small such that \\y — Ax|| < ||z|| and converges rapidly 

A nice property of Theorem |2] is that it itself does not rely on a particular form of X k . 
We only needed that the Galerkin conditions A k+1 x >k — u k+ i make the error ||x^ k — x^ k || Ak 
strictly smaller than ||x^ k — u >k || Ak . 

Here, we write the similar result in terms of residuals. 

Lemma 2. Suppose in the k-th step of the multidimensional AMEn method, the ALS step 
provides the residual decrease 

||y k - A k u >k || = |x k ||u k - A k t >k ||, |x k < 1, 

and the exact computation of the rest cores x> k = A^ijk+i after the enrichment provides 
the residual decrease 

||-y k - A k x^ k || = o) k ||u k -A k u >k ||, o) k < 1, 

where cu k ^ tuj k from Lemma Q] with X k = 7] (X' k \ . . . , X' d '). Then, the total convergence 
rate of the AMEn method is bounded by 

iiu - Ax ii < Y- cUk ^ k 1 1 /-, m 2 ■ " y ~ At " • 

k=l m=1 V ] ~~ ^m 

Proof. As previously, we assume that X' d ' is computed exactly Then, 

II,, a „>d— 1 || ... II-. a ii^d— 1 || 

MUd — 1 — A d-1* II — <^d-l||yd-l — A d-l u II- 

The base of the recursion is proved. 

Suppose the theorem holds for A 2 x >2 = y 2 , i-e. 

d-l k-1 

\\y 2 -A 2 x^\\ ^ Y_ ^icWcIl "Tf^T ' ^2 " A 2 t >2 || = 0||u 2 - A 2 t^ 2 ||, (28) 

k=2 m=2 V 1 W m 

and write the total progress for the whole system. We have 

u-At(X (1) ,X^ 2 ) =-y-AT(X (1) ,Xf)+AT(X (1) ,Xf)-AT(X (1) ,X> 2 ). (29) 

The exact solution for the second block is the oblique projection (|26)> , hence 
y - At(X (1) , X^ 2 ) = (I - AXt (X;AXt T 1 X*)y. 
The last two terms in (|29|) are similar to that in Theorem 12. 

At(X (1) , Xf) - At(X (1) , X> 2 ) = AX, (xf - x^ 2 ), 

but now it is not orthogonal to (I — AX] (X^AX] ) _1 X*). Therefore, we can only use the 
triangle inequality, 



\y - At(X (1) , X^ 2 ) || ^ ||(I - AXt (XfAXi ^XfHj \\ + ||AXt (x 
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>,2 _ x ^ 



The first term is the residual after the Galerkin solution, which is bounded by cui \\y — Au|| . 
For the second term, we have the recursion assumption (f28)> , that is 

||XfAXi(x* 2 -x >2 )|| <fl||X?AXi(x5f 2 -t> 2 )||. 

However, the only way to relate ||Aw|| and ||X*Aw|| is to use the angle between X] and 
AX] , employing (f27[) . 



\\x;AXi{xf -x >2 )\\ > yi -u^HAX^xf -x >2 ) 

\\AX,{xf -x> 2 )\\ < JL- WX^AX^xf-e 2 ) 

V 1 -">i 

Since X^AX^ 2 = Xfy, and Xit^ 2 = u, it holds 

HAX^xf-x^JlK ^^=||Xtz|| ^ ° " 

II ■ V x / 1 1 ^* P-t y II I ! ! 

Therefore, for the total residual we have 



1-o> 2 ^ vT=7^' 



||y-AT(X^,X^ 2 )K U 1+ -=l=aj||u-Au||. 
Plugging in the ALS update, the final estimate for ((29)> now writes as follows, 
||u-At(X (1 \X> 2 )|| ^ w U 1+ — =L=aj ||y-At||, 

which finishes the recursion. □ 

Contrarily to the symmetric positive definite case, where the total progress of the 
AMEn method was deteriorating with d, but less than 1 in any case, here we may have a 
situation when the progress bound given by Lemma El is greater than 1. Up to this mo- 
ment, the only available estimate is cu k ^ w\, since we enrich the basis by Z k , i.e. the first 
Krylov vector only. In principle, it is possible to include a larger approximate Krylov basis 
into the enrichment, i.e. 

2< k ) Q[1](k) . . . Q[m-1](k) 

where q [p] = T(Q [ P ](k) , . . . , QW) « A£z k , p = 1,...,m-1. However, this was not found 
to be reasonable in practical experiments. In all considered cases, the decays w k and \x^ 
provided by the single enrichment Z k appeared to be sufficiently small to ensure the 
convergence, fast enough to overcome the work required to prepare several Krylov vectors. 
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